fix(markdown_parser): paragraph with trailing hard break absorbs following blank line #9931

Merged
ematipico merged 1 commit into biomejs:main from jfmcdowell:fix/9857-md-paragraph-hard-break
Apr 12, 2026

Conversation

@jfmcdowell
Contributor

@jfmcdowell jfmcdowell commented Apr 12, 2026

Note

This PR was created with AI assistance (Claude Code).

Summary

Fixes #9857.

When a paragraph ends with a hard line break (two or more trailing spaces before the \n) followed by a blank line and another paragraph, the parser incorrectly merged both paragraphs into a single MD_PARAGRAPH node. The root cause: MD_HARD_LINE_LITERAL already consumes the line-ending newline, so the NEWLINE token that follows it is the blank-line separator. The existing inline loop did not recognize that NEWLINE as a paragraph boundary.
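
To make the root cause concrete, here is a toy lexer. It is not Biome's lexer: `Tok` and `lex` are hypothetical names, and the sketch assumes only what the description above states, namely that the hard-break token swallows its own line ending. It handles the trailing-spaces form of a hard break and nothing else.

```rust
// Hypothetical token type for illustration; not Biome's syntax kinds.
#[derive(Debug, PartialEq, Eq)]
enum Tok<'a> {
    Text(&'a str),
    HardBreak, // stands in for MD_HARD_LINE_LITERAL: trailing spaces plus the '\n'
    Newline,   // stands in for NEWLINE
}

// Toy line-based lexer. Assumption: a hard break is a non-empty line that
// ends in two or more spaces and is followed by a newline.
fn lex(src: &str) -> Vec<Tok<'_>> {
    let mut out = Vec::new();
    let mut lines = src.split('\n').peekable();
    while let Some(line) = lines.next() {
        let ends_with_newline = lines.peek().is_some();
        let trimmed = line.trim_end_matches(' ');
        let trailing_spaces = line.len() - trimmed.len();
        if ends_with_newline && !trimmed.is_empty() && trailing_spaces >= 2 {
            out.push(Tok::Text(trimmed));
            // The hard break consumes the newline, so no Newline token is emitted here.
            out.push(Tok::HardBreak);
        } else {
            if !line.is_empty() {
                out.push(Tok::Text(line));
            }
            if ends_with_newline {
                out.push(Tok::Newline);
            }
        }
    }
    out
}
```

In this model, `lex("foo  \n\nbar")` yields a single `Newline` between the hard break and `bar`, whereas `lex("foo\n\nbar")` yields two. A loop that only treats two consecutive `Newline` tokens as a blank line would miss the boundary in the first case.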

The fix hoists after_hard_break state across loop iterations and breaks the paragraph when a bare NEWLINE follows a hard line break. Container continuations (blockquotes, list items) are unaffected because their continuation tokens (>, indent) appear before any NEWLINE.
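
The shape of the fix can be sketched over a simplified token slice. This is a minimal model under stated assumptions, not the code in `parse_inline_item_list`: `Tok` and `paragraph_len` are hypothetical, blank-line detection is reduced to "a newline followed by another newline or end of input", and container continuations are left out entirely.

```rust
// Hypothetical token kinds for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Text,
    HardBreak, // already includes its own line ending
    Newline,
}

// Returns how many tokens belong to the first paragraph.
fn paragraph_len(tokens: &[Tok]) -> usize {
    // Declared outside the loop so it survives from one iteration to the next.
    let mut after_hard_break = false;
    let mut i = 0;
    while i < tokens.len() {
        match tokens[i] {
            Tok::Newline => {
                // A bare newline right after a hard break is the blank line.
                if after_hard_break {
                    break;
                }
                // Simplified pre-existing rule: a newline followed by another
                // newline (or end of input) ends the paragraph.
                if matches!(tokens.get(i + 1), Some(Tok::Newline) | None) {
                    break;
                }
                i += 1;
            }
            Tok::HardBreak => {
                after_hard_break = true;
                i += 1;
            }
            Tok::Text => {
                after_hard_break = false;
                i += 1;
            }
        }
    }
    i
}
```

With the flag in place, `[Text, HardBreak, Newline, Text]` stops after two tokens, while `[Text, HardBreak, Text]` still consumes all three, so a hard break followed directly by more text stays inside one paragraph.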

Also simplifies the adjacent whitespace-trivia consumption by removing a redundant outer guard.

Test Plan

  • just test-crate biome_markdown_parser
  • just test-markdown-conformance
  • just test-crate biome_markdown_formatter
  • just lint

Docs

N/A

@changeset-bot

changeset-bot bot commented Apr 12, 2026

⚠️ No Changeset found

Latest commit: 6b0765f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions github-actions bot added the A-Parser (Area: parser), A-Formatter (Area: formatter), and L-Markdown (Language: Markdown) labels Apr 12, 2026
@jfmcdowell jfmcdowell force-pushed the fix/9857-md-paragraph-hard-break branch from 4a162df to 6b0765f April 12, 2026 05:34
@codspeed-hq

codspeed-hq bot commented Apr 12, 2026

Merging this PR will not alter performance

✅ 28 untouched benchmarks
⏩ 228 skipped benchmarks [1]


Comparing jfmcdowell:fix/9857-md-paragraph-hard-break (6b0765f) with main (19ff706)

Footnotes

  1. 228 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jfmcdowell jfmcdowell marked this pull request as ready for review April 12, 2026 05:57
@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

Walkthrough

This PR fixes the markdown parser's handling of hard line breaks (MD_HARD_LINE) followed by empty lines. Previously, the parser incorrectly absorbed empty lines and subsequent paragraphs into a single MD_PARAGRAPH node. The fix introduces an after_hard_break state variable in parse_inline_item_list that tracks when a hard line has been parsed, causing the parser to treat the next bare NEWLINE as a paragraph boundary separator rather than inline content. Test fixtures validate both scenarios: hard breaks followed by empty lines and those without.

Suggested reviewers

  • dyc3
  • ematipico
🚥 Pre-merge checks | ✅ 4 passed

  • Title check: ✅ Passed. The title accurately describes the core fix: addressing a paragraph parser bug where hard line breaks incorrectly absorb following blank lines.
  • Description check: ✅ Passed. The description clearly explains the bug, root cause, and fix strategy, relating directly to the changeset and test plan.
  • Linked Issues check: ✅ Passed. The code changes fully address issue #9857 by hoisting the after_hard_break state across loop iterations and treating bare NEWLINEs as paragraph boundaries after hard line breaks, with test fixtures validating the fix.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to fixing the paragraph parsing logic and adding corresponding test fixtures; no unrelated modifications are present.

Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_markdown_parser/src/syntax/mod.rs (1)

1083-1099: ⚠️ Potential issue | 🟠 Major

Synchronise inline pre-scan with the new hard-break boundary rule.
Line 1097 now terminates on a bare NEWLINE after a hard break, but inline_list_source_len/scan_newline_in_inline_list don’t track that state. This can overrun the emphasis context into the next paragraph and skew delimiter matching.

💡 Suggested fix sketch
fn inline_list_source_len(p: &mut MarkdownParser) -> usize {
    let start: usize = p.cur_range().start().into();
    p.lookahead(|p| {
        let mut has_content = false;
+       let mut after_hard_break = false;

        loop {
            if p.at(T![EOF]) {
                break;
            }

            if p.at(NEWLINE) {
+               if after_hard_break {
+                   break;
+               }
                if scan_newline_in_inline_list(p, has_content) {
                    break;
                }
+               after_hard_break = false;
                continue;
            }

+           if after_hard_break
+               && p.at(MD_TEXTUAL_LITERAL)
+               && p.cur_text().chars().all(|c| c == ' ' || c == '\t')
+           {
+               p.bump(MD_TEXTUAL_LITERAL);
+               continue;
+           }
+
            if !p.cur_text().chars().all(|c| c == ' ' || c == '\t') {
                has_content = true;
            }

+           after_hard_break = p.at(MD_HARD_LINE_LITERAL);
            p.bump(p.cur());
        }

        let end: usize = p.cur_range().start().into();
        end.saturating_sub(start)
    })
}

Also applies to: 1145-1149
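
The reviewer's concern is that the pre-scan and the parse loop walk the same tokens and should agree on where the paragraph ends. The toy below illustrates how the two can diverge. `Kind`, `Tok`, and `source_len` are hypothetical stand-ins and do not mirror the real `inline_list_source_len`; the `honor_hard_break` flag exists only to compare the two behaviours side by side.

```rust
// Hypothetical token model: a kind plus its length in source bytes.
#[derive(Clone, Copy)]
enum Kind {
    Text,
    HardBreak,
    Newline,
}

struct Tok {
    kind: Kind,
    len: usize,
}

// Sums the source length of the inline run that starts at tokens[0].
// When `honor_hard_break` is false, only the simplified blank-line rule
// (newline followed by newline or end of input) stops the scan.
fn source_len(tokens: &[Tok], honor_hard_break: bool) -> usize {
    let mut after_hard_break = false;
    let mut len = 0;
    for (i, tok) in tokens.iter().enumerate() {
        if let Kind::Newline = tok.kind {
            let blank_line = matches!(
                tokens.get(i + 1).map(|t| t.kind),
                Some(Kind::Newline) | None
            );
            if blank_line || (honor_hard_break && after_hard_break) {
                break;
            }
        }
        after_hard_break = matches!(tok.kind, Kind::HardBreak);
        len += tok.len;
    }
    len
}

// Token stream for the source "*a  \n\nb*" in this toy model.
fn sample() -> Vec<Tok> {
    vec![
        Tok { kind: Kind::Text, len: 2 },      // "*a"
        Tok { kind: Kind::HardBreak, len: 3 }, // "  \n"
        Tok { kind: Kind::Newline, len: 1 },   // "\n"
        Tok { kind: Kind::Text, len: 2 },      // "b*"
    ]
}
```

For the sample stream, the scan that ignores the hard-break state covers 8 bytes and so spans both `*` delimiters, while the scan that honors it stops at 5 bytes, matching the paragraph boundary the parse loop now uses.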

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/mod.rs` around lines 1083 - 1099, The
inline pre-scan must respect the new hard-break boundary: modify
inline_list_source_len and scan_newline_in_inline_list so they are aware of the
after_hard_break condition (or accept a flag) and stop scanning when a bare
NEWLINE follows a hard break, mirroring the loop in mod.rs that breaks on
NEWLINE when after_hard_break is true; update calls to
inline_list_source_len/scan_newline_in_inline_list from the parser loop (where
after_hard_break is set) to pass the state and ensure emphasis delimiter
matching does not continue past that boundary.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 71398d5b-4631-477c-804d-7493537f7a79

📥 Commits

Reviewing files that changed from the base of the PR and between 19ff706 and 6b0765f.

⛔ Files ignored due to path filters (2)
  • crates/biome_markdown_formatter/tests/specs/markdown/hard_line.md.snap is excluded by !**/*.snap and included by **
  • crates/biome_markdown_parser/tests/md_test_suite/ok/hard_line_break_paragraph_split.md.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (3)
  • crates/biome_markdown_parser/src/syntax/mod.rs
  • crates/biome_markdown_parser/tests/md_test_suite/ok/hard_line_break_paragraph_split.html
  • crates/biome_markdown_parser/tests/md_test_suite/ok/hard_line_break_paragraph_split.md

@ematipico ematipico merged commit 1aa85f5 into biomejs:main Apr 12, 2026
17 checks passed
@jfmcdowell jfmcdowell deleted the fix/9857-md-paragraph-hard-break branch April 13, 2026 12:30

Development

Successfully merging this pull request may close these issues.

biome_markdown_parser: paragraph containing hard line breaks incorrectly absorbs following empty line and paragraph

2 participants